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Abstract 

Gaussian Process (GP) models are often used as mathematical ap- 
proximations of computationally expensive experiments. Provided that 
its kernel is suitably chosen and that enough data is available to obtain 
a reasonable fit of the simulator, a GP model can beneficially be used 
for tasks such as prediction, optimization, or Monte-Carlo-based quantifi- 
cation of uncertainty. However, the former conditions become unrealistic 
when using classical GPs as the dimension of input increases. One popular 
alternative is then to turn to Generalized Additive Models (GAMs), re- 
lying on the assumption that the simulator's response can approximately 
be decomposed as a sum of univariate functions. If such an approach has 
been successfully applied in approximation, it is nevertheless not com- 
pletely compatible with the GP framework and its versatile applications. 
The ambition of the present work is to give an insight into the use of 
GPs for additive models by integrating additivity within the kernel, and 
proposing a parsimonious numerical method for data-driven parameter 
estimation. The first part of this article deals with the kernels naturally 
associated to additive processes and the properties of the GP models based 
on such kernels. The second part is dedicated to a numerical procedure 
based on relaxation for additive kernel parameter estimation. Finally, the 
efficiency of the proposed method is illustrated and compared to other 
approaches on Sobol's g-function. 
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1 Introduction 



The study of numerical simulators often deals with calculation intensive com- 
puter codes. This cost implies that the number of evaluations of the numerical 
simulator is limited and thus many methods such as uncertainty propagation, 
sensitivity analysis, or global optimization are unafFordable. A well known ap- 
proach to circumvent time limitations is to replace the numerical simulator by 
a mathematical approximation called metamodel (or response surface or sur- 
rogate model) based on the responses of the simulator for a limited number 
of inputs called the Design of Experiments (DoE). There is a large number of 
metamodels types and among the most popular we can cite regression, splines, 
neural networks... In this article, we focus on a particular type of metamodel: 
the Kriging method, more recently referred to as Gaussian Process modeling 
Originally presented in spatial statistics [2] as an optimal Linear Unbi- 
ased Predictor (LUP) of random processes, Kriging has become very popular 
in machine learning, where its interpretation is usually restricted to the conve- 
nient framework of Gaussian Processes (GP). Beyond the LUP — which then 
elegantly coincides with a conditional expectation — , the latter GP interpreta- 
tion allows indeed the explicit derivation of conditional probability distributions 
for the response values at any point or set of points in the input space. 

The classical Kriging method faces two issues when the number of dimen- 
sions d of the input space D C M."^ becomes high. Since this method is based 
on neighborhoods, it requires an increasing number of points in the DoE to 
cover the domain D. The second issue is that the number of anisotropic kernel 
parameters to be estimated increases with d so that the estimation becomes par- 
ticularly difficult for high dimensional input spaces [31 13] . An approach to get 
around the first issue is to consider specific features lowering complexity such 
as the family of Additive Models. In this case, the response can approximately 
be decomposed as a sum of univariate functions: 



where /i £ K and the /^'s may be non-linear. Since their introduction by Stones 
in 1985 [5,, many methods have been proposed for the estimation of additive 
models. We can cite the method of marginal integration [B] and a very popular 
method described by Hastie and Tibshirani in [H [7] : the GAM backfitting algo- 
rithm. However, those methods do not consider the probabilistic framework of 
GP modeling and do not usually provide additional information such as the pre- 
diction variance. Combining the high-dimensional advantages of GAMs with the 
versatility of GPs is the main goal of the present work. For the study functions 
that contain an additive part plus a limited number of interactions, an extension 
of the present work can be found in the recent paper of T. Muehlenstaedt [1]. 

The first part of this paper focuses on the case of additive Gaussian Pro- 
cesses, their associated kernels and the properties of associated additive kriging 
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models. The second part deals with a Relaxed Likelihood Maximization (RLM) 
procedure for the estimation of kernel parameters for Additive Kriging models. 
Finally, the proposed algorithm is compared with existing methods on a well 
known test function: the Sobol's g-function [5]. It is shown within the latter 
example that Additive Kriging with RLM outperforms standard Kriging and 
produce similar performances as GAM. Due to its approximation performance 
and its built-in probabilistic framework both demonstrated later in this arti- 
cle, the proposed Additive Kriging model appears as a serious and promising 
challenger among additive models. 

2 Towards Additive Kriging 

2.1 Additive random processes 

Lets first introduce the mathematical construction of an additive GP. A function 
/ : _D C M"* — > M is additive when it can be written /(x) = X]f=i fi{^i)i where 
Xi is the i-th component of the d-dimensional input vector x and the fi's are 
arbitrary univariate functions. Let us first consider two independent real-valued 
first order stationary processes Zi and Z2 defined over the same probability 
space (fi, T, P) and indexed by M, so that their trajectories Zi{.;Ld) : x G M. — >■ 
Zi{x;uj) are univariate real- valued functions. Let i^T j : K x M — 5- M be their 
respective covariance kernels and /ii,/i2 G ^ their means. Then, the process 
Z := Zi + Z2 defined over {fl, T , P) and indexed by R^, and so that 

Vw e 17 Vx G Z{x;uj) = Zi{xi;uj) + Z2{x2;uj), (2) 

has mean fJ- = fJ-i + fJ-2 and kernel -ft'(x, y) = Ki{xi, yi) + K2{x2, 1/2) ■ Following 
equation[2l the latter sum process clearly has additive paths. In this document, 
we call additive any kernel of the form K : (x, y) £ R"* x K'^ — > K{x,y) = 
^f^i Ki{xi,yi) where the if^'s are semi-positive definite (s.p.d.) symmetric 
kernels over R x R. Although not commonly encountered in practice, it is well 
known that such a combination of s.p.d kernels is also a s.p.d. kernel in the 
direct sum space [llj. Moreover, one can show that the paths of any random 
process with additive kernel are additive in a certain sens: 

Proposition 1. Any (square integrable) random process Zx possessing an ad- 
ditive kernel is additive up to a modification. In essence, it means that there 
exists a process which paths are all additive, and such that Vx G X, P(Zx = 

^x) = l. 

The proof of this property is given in appendix for d — 2. For d = n 
the proof follows the same pattern but the notations are more cumbersome. 
Note that the class of additive processes is not actually limited to processes 
with additive kernels. For example, let us consider Zi and Z2 two correlated 
Gaussian processes on {ft, T , P) such that the couple {Zi, Z2) is Gaussian. Then 
Zi{xi) + Z2{x2) is also a Gaussian process with additive paths but its kernel 
is not additive. However, in the next section, the term additive process will 
always refer to GP with additive kernels. 
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2.2 Invertibility of covariance matrices 

In practice, the covariance matrix K of the observations of an additive process 
Z at a design of experiments X — (x^^^ . . . x*^"^)-^ may not be invertible even 
if there is no redundant point in X . Indeed, the additivity of Z may introduce 
hnear relationships (that holds almost surely) between the observed values of Z 
and lead to the non invertibility of K. Figure [1] shows two examples of designs 
leading to a linear relationship between the observation. For the left panel, the 
additivity of Z implies that Z(x('*)) = ^(x^^)) + Z(x(3)) - Z(x(i)) and thus 
the fourth column of the covariance matrix is a linear combination of the three 
other columns: if (xW,x(4)) = if(x«,x(2)) + X(xW,x(3)) - ii:(xW,x(i)) and 
the associated covariance matrix is not invertible. 
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Figure 1: 2-dimensional examples of DoE which lead to non- invertible covariance 
matrix in the case of random processes with additive kernels. 



A first approach is to remove some points in order to avoid any linear com- 
bination, which is furthermore in accordance with the aim of parsimonious eval- 
uations for costly simulators. Algebraic methods may be used for determining 
the combination of points leading to a linear relationship between the values of 
the random process but this procedure is out of the scope of this paper. 



2.3 Additive Kriging 

Let z : Z) — > K. be the function of interest (a numerical simulator for ex- 
ample), where Z) C M''. The responses of z at the DoE X are noted Z = 
(z(x(i)) ... z(x(")))^. Simple kriging relies on the hypothesis that z is one path 
of a centered random process Z with kernel K. The expression of the best 
predictor (also called kriging mean) and of the prediction variance are : 

m(x) = fc(x)^K-iZ 
w(x) = K{^, x) - fc(x)'^K"ifc(x) 

where A;(x) = (i4r(x, x*^^') ... i4r(x, x*^"'))^ and K is the covariance matrix 
of general term K^j = if (x'^^x^^^). Note that these equations respectively 



4 



correspond to the conditional expectation and variance in the case of a GP with 
known kernel. In practice, the structure of k is supposed known (e.g. power- 
exponential or Matern families) but its parameters are unknown. A common 
way to estimate them is to maximize the likelihood of the kernel parameters 
given the observations Z [TU |TT] . 

Equations 13] are valid for any s.p.d kernel, thus they can be applied with additive 
kernels. In this case, the additivity of the kernel implies the additivity of the 
kriging mean: for example in dimension 2, for K(jx., y) = Ki(xi, yi)+K2{x2, 2/2) 
we have 

m(x) ^ {h{xi) + k2ix2)f{Ki+K2)-'Z 

= ki{xif{Ki + K2)-^Z + k2{x2f{Ki + K2)-^Z (4) 
= mi{xi) + 7712(2:2). 

Another interesting property concerns the variance: v can be null at points that 
do not belong to the DoE. Let us consider a two dimensional example where 
the DoE is composed of the 3 points represented on the left pannel of figure [TJ 
X = {x*^^) x^^) x*^'^)}. Direct calculations presented in Appendix B shows that 
the prediction variance at the point x^**) is equal to 0. This particularity follows 
from the fact that the value of the additive process are known almost surely at 
the point a:^'*-' based on the observations at X. In the next section, we illustrate 
the potential of Additive Kriging on an example and propose an algorithm for 
parameter estimation. 

2.4 Illustration and further consideration on a 2D exam- 
ple 

We present here a first basic example of an additive kriging model. We consider 
D = [0,1]^, and a set of 5 points in D where the value of the observations F are 
arbitrarily chosen. Figure [2] shows the obtained kriging model. We can see on 
this figure the properties we mentioned above: the kriging mean is an additive 
function and the prediction variance can be null for points that do no belong to 
the DoE. 

The effect of any variable can be isolated and represented so as the meta- 
model can be split in a sum of univariate sub-models. Moreover, we can get 
confidence intervals for each univariate model. As the expression of the first 
univariate model is 

m,{x,) = k,{xif{K,+K2)-'F (5) 

the effect of the direction 2 can be seen as an observation noise. We thus get 
an expression for the prediction variance of the first main effect 

v,{x,) = K,{x,,x,) - fci(a;i)^(Ki + K2)-^k,{x,). (6) 

The left panel of figure[3]shows the obtained sub-model for the first direction. 
The interest of such graphic is limited since a 2-dimensional function can be 
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Figure 2: Approximation of the function / based on five observations (black 
dots). The left panel represents the best predictor and the right panel the pre- 
diction variance, the kernel used is the additive gaussian kernel with parameters 
C7 = (1 1) and 6 = (0.6 0.6). 
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Figure 3: Univariate models of the 2-dimensional example. The left panel plots 
mi and the 95% confidence intervals ci{xi) = mi(xi) ± 2y^vi{xi). The right 
panel shows the sub- model of the centrated univariate effects ml and cl{xi) = 

plotted but this decomposition becomes useful to get an insight on the effect 
of a variable when d > 2. However, we can see that the confidence intervals 
are wide. This is because the sub-models are define up to a constant. In 
order to get rid of the effect of such a translation, an option is to approximate 
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Zi{xi) — J Zi{si)dsi conditionally to the observations: 

m*ix,) - E Z,{x,) - J Z,{s,)ds, Z{X) = Y 
v*{xi) = var Zi{xi) - / Z^{si)dsi Z{X) = Y 



(7) 



The expression of m*{xi) is straightforward whereas v*{xi) requires more cal- 
culations which are given in Appendix C. 



m*{xi) = m^{xi) - J mi{si)dsi 

V* (xi) ^ Vi{xi) - 2 J Ki{xi,Si)dsi + 2 J ki{xi)'^ K~^ki{si)dsi (8) 
+ // K,{s„U)dsidti ~ // h{tifK-^h{si)ds,dU 



The benefits of using m* and v* and then to define the sub-models up to a 
constant can be seen on the right panel of figure |3l At the end, the probabilistic 
framework gives an insight on the error of the metamodel but also of each sub- 
model. 



3 Parameter estimation 

3.1 Maximum likelihood estimation (MLE) 

MLE is a standard way to estimate covariance parameters and it is covered in 
detail in the literature [TTJ [T3] . Let F be a centered additive Process and V'i = 
{erf , 0i} with i G {1, . . . , d} the parameters of the univariate kernels. According 
to the MLE methodology, the best values tjj* for the parameters ipi are the 
values maximizing the likelihood of the observations Y: 

where K(ip) — Ki(V'i) + • ■ • + K.d{ipd) is the covariance matrix depending on the 
parameters ipi. The latter maximization problem is equivalent to the usually 
preferred minimization of 

/(^i, . . . , Vd) log(det(K(V))) + Y^K(^)-iY (10) 

Obtaining the optimal parameters ip* relies on the succesful use of a non-convex 
global optimization routine. This can be severely hindered for large values of d 
since the search space of kernel parameters becomes high dimensional. One way 
to cope with this issue is to separate the variables and split the optimization 
into several low-dimensional subproblems. 
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3.2 The Relaxed Likelihood Maximization algorithm 

The aim of the Relaxed Likehhood Maximization (RLM) algorithm is to treat 
separately the optimization in each direction. In this way, RLM can be seen 
as a cyclic relaxation optimization procedure |14| with initial values of the pa- 
rameters af set to zero. As we will see, the main originality here is to consider 
a kriging model with an observation noise variance that fluctuates during 
the optimization. This parameter account for the metamodel error (if the func- 
tion is not additive for example) but also for the inaccuracy of the intermediate 
values of ai and Oi. 

The first step of the algorithm is to estimate the parameters of the kernel Ki. 
The simplification of the method is to consider that all the variations of Y in 
the other directions can be summed up as a white noise. Under this hypothesis, 
I depends on ■01 and r: 

^(V'l, r) = log(det(Ki(Vi) + r^/rf)) + {K^{^i) + r'^hY^Y (11) 

Then, the couple {V"!, ■''*} that maximizes /(V"!, t) can be obtained by numerical 
optimization. 

The second step of the algorithm consists in estimating %l)2 , with fixed to ifil : 

{V'2,'''*} =argmax(Z('0i, ^^2, t)), with 

liri,^2,T) =log(det(Ki(0*)+K2(V'2) + T2/<i))+ (12) 
Y^(Ki(V'i*) + K2(^2) + r^Id)-^Y 

This operation can be repeated for all the directions until the estimation of 
tpd- However, even if all the parameters ipi have been estimated, it is fruitful to 
re-estimate them such that the estimation of the parameter ipi can benefit of the 
values ip* for j > i. Thus, the algorithm is composed of a cycle of estimations 
that treat each direction one after each other: 

RLM Algorithm : 

1. Initialize the values af^ = for i G {1, . . . , d} 

2. For k from 1 to number of iteration do 

3. For I from 1 to cZ do 

4. rC^)} = argmin(/,(Vf \ ■ • ■ , ^S, V-;, • ■ • , 

5. End For 

6. End For 

T is a parameter tuning the fidelity of the metamodel since for t — the 
kriging mean interpolates the data. In practice, this parameter is decreasing at 
almost each new estimation. Depending on the observations and on the DoE, 
T converges either to a constant or to zero (cf. the g-function example and 
figure [6]) . When zero is not reached, should correspond to the part of the 
variance that cannot be explained by the additive model. Thus, the comparison 
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between and the allows us to quantify the degree of additivity of the 
objective function according to the metamodel. 

This procedure of estimation is not meant to be applied for kernels that are not 
additive. The method developed by Welch for tensor product kernels in [TU] has 
similarities since it corresponds to a sequential estimation of the parameters. 
One interesting feature of Welch's algorithm is to choose at each step the best 
search direction for the parameters. The RLM algorithm could easily be adapted 
in a similar way to improve the quality of the results but the corresponding 
adapted version would be much more time consuming. 

4 Comparison between the optimization's meth- 
ods 

The aim of this section is to compare the RLM algorithm to the Usual Like- 
lihood Maximization (ULM). The test functions that are considered are paths 
of an additive GP Y with Gaussian additive kernel K. For this example, the 
parameters of K are fixed to ai = 1, di = 0.2 for z G 1 . . .d but those values are 
supposed to be unknown. 

Here, 2 X d + 1 parameters have to be estimated: d for the variances, d for 
the range and 1 for the noise variance t^. For ULM, they are estimated si- 
multaneously, whereas the RLM is a 3-dimensional optimization at each step. 
In both cases, we use the L-BFGS-B method of the function optim with the R 
software. To judge the effectiveness of the algorithms, we compare here the best 
value found for the log-likelihood I to the computational budget (the number 
of call to I) required for the optimization. As the optim function returns the 
number nc of call to I and the best value bv at the end of each optimization, 
we obtain for the MLE on one path of Y one value of nc and bv for ULM and 
nbjiteration x d values of nv and bv for RLM since there is one optimization at 
each step of each iteration. 

The panel (a) of figure |4] presents the results for the two optimizations on a 
path of a GP for d = 5. On this example, we can see that ULM needs 1500 calls 
to the log-likelihood before convergence whereas RLM requires much more calls 
before convergence. However, the result of the two methods are similar for 1500 
calls but the result of RLM after 5000 calls is substantially improved. In order 
to get more robust results we simulate 20 paths of Y and we observe the global 
distribution of the variables nv and bv. Furthermore, we study the evolution 
of the algorithm performances when the dimension increases choosing various 
values for the parameter d from 3 to 18 with a Latin Hypercube (LH) Design 
with maximin criteria [13] containing 10 x d points. We observe on the panels 
(b), (c) and (d) of figure |4] that optimization with the RLM requires more calls 
to the function I, but this method appears to be more efficient and robust than 
ULM. Those results are stressed by figure [H] where the final best value of RLM 
and ULM are compared. This figure also shows that the advantage of using 
RLM comes bigger when d is getting larger. 
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5 Application to the g-function of Sobol 



In order to illustrate the methodology and to compare it to existing algorithms, 
an analytical test case is considered. The function to approximate is the g- 
function of Sobol defined over [0, 1]'' by 



5(x) = n ^'^'^^ with a, > (13) 

This popular function in the literature [9] is obviously not additive. However, 
depending on the coefficients afc, g can be very close to an additive function. 
As a rule, the g-function is all the more additive as the ak are large. One main 
advantage for our study is that the Sobol sensitivity indices can be obtained 
analytically so we can quantify the degree of additivity of the test function. For 
i = 1, . . . ,d the indice Si associated to the variables Xi is 



S, = ^_ ^(^+°')^ , . (14) 



3(l+afc)2 



- 1 



Here we limit ourselves to the case d = 4 and following [16] we choose at = k 
for k G {1, 4}. For this combination of parameters, the sum of the first order 
Sobol indices is 0.95 so the g-function is almost additive. The considered DoE 
are LH maximin designs based on 40 points. To asses the quality of the obtained 
metamodels, the predictivity coefficient Q2 is computed on a test sample of 
n — 1000 points uniformly distributed over [0, 1]^. Its expression is: 

where y is the vector of the values at the test points, y is the vector of predicted 
values and y is the mean of y. 

We run on this example 5 iterations of the RLM algorithm with kernel Matern 
3/2. The evolution of the estimated observation noise is represented on 
figure [HI On this figure, it appears that the observation noise is decreasing 
as the estimation of the parameters is improved. Here, the convergence of 
the algorithm is reached at iteration 4. The overall quality of the constructed 
metamodel is high since Q2 = 0-91 and the final value for is 0.01. 

As previously the expression of the univariate sub-metamodels is 

mr{x,) = h{x,f{Ki -t- K2 + K3 -f K4)-iY (16) 

The univariate functions obtained are presented on figure [T] The confidence 
intervals are not represented here in order to enhance the readability of the 
graphics and the represented values are centered to ensure that the observations 
and the univariate functions are comparable. 
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As the value of Q2 is likely to fluctuate with the DoE and the optimization 
performances, we compare here the proposed RLM algorithm with other meth- 
ods for 20 different LHS. The other methods used for the test are (a) additive 
kriging model with ULM, (b) kriging with usual tensor-product kernel, (c) the 
GAM algorithm. The results for classical kriging and GAM are obtained with 
the DiceKrigin^ |T7] and the GAM packages for R available on the GRAN [TS] . 
As the value of the are the same as in [16] where Marrel et al. presents a 
specific algorithm for sequential parameter estimation in non-additive kriging 
models, the results of this paper are also presented as method (d). The mean 
and the standard deviation of the obtained Q2 are gathered in table [T] 



Algorithm 


kernel 


mcan((52) 


sd(g2) 


RLM 


Additive Matern 3/2 


0.90 


0.016 


ULM 


Additive Matern 3/2 


0.88 


0.037 


Standart Kriging 


Matern 3/2 


0.82 


0.042 


GAM 


(smoothing splines) 


0.90 


0.021 


Marrel 


power-exponential 


0.86 


0.07 



Table 1: Q2 predictivity coefficients at a 1000-points test sample for the various 
methods. 



6 Concluding remarks 

The proposed methodology seems to be a good challenger for additive model- 
ing. On the example of the GP paths, the RLM appears to be more efficient 
than usual likelihood maximization and well suited for high dimensional model- 
ing. On the second example, additive models benefit of the important additive 
component of the g-function and outperform non additive models even if the 
function is not purely additive. The predictivity of the RLM is equivalent to 
that of GAM but its robustness is higher for this example. 

One main difference between RLM and GAM backfitting is that RLM takes 
into account the estimated parameters into the covariance structure whereas 
GAM subtracts from the observation the predicted value for all the sub-models 
obtained in the other directions. 

We would like to stress how the proposed metamodels take advantage of addi- 
tivity, while benefiting from GP features. For the first point we can cite the 
complexity reduction and the interpretability. For the second, the main asset 
is that probabilistic metamodels provide the prediction variance. This justifies 
the fact of modeling an additive function on R'^ instead of building d metamod- 
els over R since the prediction variance is not additive. We can also note the 
opportunity to choose a kernel adapted to the function to approximate. 

*As for RLM and ULM, DiceKriging also use the BFGS algorithm for the likelihood max- 
imization 
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At the end, the proposed methodology is fully compatible with Kriging-based 
methods and its versatile applications. For example, one can choose a well 
suited kernel for the function to approximate or use additive kriging for high- 
dimensional optimization strategies relying on the expecting improvement cri- 
teria. 
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Appendix A: Proof of proposition [1] for d = 2 

Let Z he a random process indexed by with kernel K{x, y) = Ki{xi,yi) + 
-K^2 (2^2 5 2/2 ) , and Zt the random process defined by Zt{xi,X2) = Z{xi,0) + 
Z{0,X2) — Z{0,0). By construction, the paths of Zt are additive functions. 
In order to show the additivity of the paths of Z, we will show that € 
R^, P(Z(x) = 2't(x)) = 1. For the sake of simplicity, the three terms of 
var[.Z(x) - ^t(x)] = var[Z(x)] + var[ZT(x)] - 2cov[Z(x), Zt(x)] are studied 
separately: 

var[Z(x)] = X(x,x) 

var[ZT(x)] = var[Z(a;i, 0) + Z(0, X2) - Z{0, 0)] 

= var[Z(a;i, 0)] + var[Z(0, X2)] + 2cov[Z(a;i, 0), Z{0, X2)] 

+ var[Z(0,0)] - 2cov[Z(a;i, 0), Z(0, 0)] - 2cov[Z(0, 2:2), ^(0, 0)] 
= Ki{xi,Xi) + K2{0, 0) + Ki{0, 0) + K2{X2,X2) + K{0, 0) 

+ 2{KiixuO) + K2iO,X2)) - 2iKiixi,0) + K2i0,0)) 

-2{Ki{0,0)+K2{x2,0)) 
= Ki{xi,xi) + K2(x2,X2) = A'(x,x) 

cov[Z(x), Zt(x)] = cov[Z(xi, 2:2), Z{xi,Q) + Z(0, X2) - Z(0, 0)] 

= Ki{xi,Xi) + K2{X2,Q) + Ki{xi,Q) + K2{X2,X2) 

- Ki{xi,0) - K2{x2,0) 
= Ki{xi,xi) + K2{x2,X2) = Ar(x,x) 

Those three equations implies that var[Z(x) — Zt(x)] = 0, Vx e R^. Thus, 
P(Z(x) = Zt{x)) = 1 and there exists a modification of Z with additive paths. 
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Appendix B: Calculation of the prediction vari- 
ance 

Let consider a DoE composed of the 3 points {x^^^ x^^^ x^^^^} represented on 
the left pannel of figure [TJ We want here to show that although x*^^' ) does not 
belongs to the DoE we have v{x.^^^) = 0. 

i;(xW) i^(xW,xW)-A;(xWfK-iA;(xW) 

- X(x(4),x(4)) - (fc(x(2)) + fc(x(3)) _ fc(x(i)))^K-ifc(x(4)) 

= i^,(xl^\x(^))+i^,(x(^\xrV 



(-1 1 l)|i^,(xi^\xr)+i^2(x(^\xW^ 
,i^,(xf\x(^))+i^.(xr\xrV 
ifi(xf xf )) + i^,(x(^\ xf ) - i^i(x(^\ x(^) 



Appendix C: Calculation of v* 

We want here to calculate the variance of Zi(xi) — J Zi(si)dsi conditionally to 
the observations Y. 



v*{xi) — var 



Zi{xi) - / Zi{si)dsi 



Z{X) = Y 



: var [Z,{x,)\ Z{X) =Y]- 2cov Z,{x^), j Z,{s.,)ds, Z(X) = Y 

+ var y Z,(s.,)ds, Z{X) = Y 
Vi(xi) - 2 Si)dsi - ^ ki{xi)'^ K~^ki{si)ds 

+ 11 K,{s„U)ds,dU - // h{U)'^K-^h{s,)ds,dU 
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(a) d = 10 



(b) d = 3 
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(c) d = 9 



(d) d = 15 



Figure 4: Comparison of the optimization methods. The solid red dots are for 
usual optimizations and the black points are for the RLM method. For (a), the 
vertical lines correspond to the limit between two iterations of the algorithm. 
For (b), (c) and (d), the boxplots are based on 20 paths of Y and each one sum 
up the best values of I for a given range of number of call. 
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ULM RLM ULM RLM ULM RLM ULM RLM 

(a) d = 3 (b) d = 6 (c) d = 12 (d) d = 18 

Figure 5: Comparison of the final value for the log- likelihood with the two 
optimization algorithms for 20 paths of Y. 




1 2 3 4 5 

iteration 

Figure 6: Evolution of the observation noise on the 4-dimensional example 
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Figure 7: l-dimensional projections of the observations (bullets) on the g- 
function example for d = 4. The univariate models (solid lines) obtained after 5 
iterations of RLM are very closed to the analytical main effects (dashed lines). 
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